On Leveraging Crowdsourcing Techniques for Schema Matching Networks
Abstract
As the number of publicly available datasets is likely to grow, the demand for establishing links between these datasets is also increasing. Creating such links requires matching their schemas. Moreover, to use these datasets in meaningful ways, one often needs to match not only two, but several schemas. This matching process establishes a (potentially large) set of attribute correspondences between multiple schemas that constitute a schema matching network. Various commercial and academic schema matching tools have been developed to support this task. However, as matching is inherently uncertain, the heuristic techniques adopted by these tools produce results that are not completely correct. Thus, in practice, post-matching effort by a human expert is needed to obtain a correct set of attribute correspondences. Addressing this problem, our paper demonstrates how to leverage crowdsourcing techniques to validate the generated correspondences. We design validation questions with contextual information that can effectively guide the crowd workers. We analyze how to reduce the overall human effort needed for this validation task. Through theoretical and empirical results, we show that by harnessing natural constraints defined on top of the schema matching network, one can significantly reduce the necessary human work.
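The idea that network-level constraints can save validation questions can be illustrated with a minimal sketch. The abstract does not name the constraints it uses, so the one-to-one constraint below (an attribute matches at most one attribute in any other schema) is an assumption chosen for illustration, and the schema names, attribute names, and the functions `violates_one_to_one` and `questions_needed` are hypothetical. This is not the paper's method, only a toy instance of constraint-based pruning.

```python
# A correspondence is a pair of endpoints; each endpoint is (schema, attribute).
# All names here are hypothetical and used only for illustration.

def violates_one_to_one(c1, c2):
    """Assumed constraint: an attribute matches at most one attribute per other schema.

    Two correspondences conflict when they share exactly one endpoint and
    their remaining endpoints belong to the same schema.
    """
    ends1, ends2 = set(c1), set(c2)
    shared = ends1 & ends2
    if len(shared) != 1:
        return False
    (other1,) = ends1 - shared
    (other2,) = ends2 - shared
    return other1[0] == other2[0]


def questions_needed(candidates, is_correct):
    """Validate candidates in order, skipping any ruled out by an approved one."""
    approved, ruled_out, asked = [], set(), 0
    for c in candidates:
        if c in ruled_out:
            continue  # no question needed: the constraint already disproves it
        asked += 1  # one validation question goes to the crowd
        if is_correct(c):
            approved.append(c)
            ruled_out.update(o for o in candidates if violates_one_to_one(c, o))
    return asked, approved


candidates = [
    (("S1", "name"), ("S2", "fullName")),
    (("S1", "name"), ("S2", "nickname")),
    (("S1", "name"), ("S3", "title")),
    (("S2", "fullName"), ("S3", "title")),
]
# Stand-in for crowd answers: the set of correspondences that are truly correct.
truth = {candidates[0], candidates[3]}

asked, approved = questions_needed(candidates, lambda c: c in truth)
print(asked, "questions asked for", len(candidates), "candidates")
```

In this toy network, approving the first candidate rules out the second under the assumed constraint, so only three of the four candidates need a question. The saving depends on the order in which candidates are validated, which the sketch does not attempt to optimize.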
Similar resources
Reconciling Schema Matching Networks Through Crowdsourcing
Schema matching is the process of establishing correspondences between the attributes of database schemas for data integration purposes. Although several automatic schema matching tools have been developed, their results are often incomplete or erroneous. To obtain a correct set of correspondences, usually human effort is required to validate the generated correspondences. This validation proce...
Reducing Uncertainty of Schema Matching via Crowdsourcing
Schema matching is a central challenge for data integration systems. Automated tools are often uncertain about schema matchings they suggest, and this uncertainty is inherent since it arises from the inability of the schema to fully capture the semantics of the represented data. Human common sense can often help. Inspired by the popularity and the success of easily accessible crowdsourcing plat...
An Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2], and schema integration. This task is defined as finding the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...
Matching and Grokking: Approaches to Personalized Crowdsourcing
Personalization aims to tailor content to a person’s individual tastes. As a result, the tasks that benefit from personalization are inherently subjective. Many of the most robust approaches to personalization rely on large sets of other people’s preferences. However, existing preference data is not always available. In these cases, we propose leveraging online crowds to provide on-demand perso...
Reconciling Schema Matching Networks
Schema matching is the process of establishing correspondences between the attributes of schemas, for the purpose of data integration. Schema matching is often performed in a pair-wise setting, in which two given schemas are matched against each other by automatic tools. In this thesis, we instead approach the schema matching problem in a network setting, in which the two schemas to be matched do...